Hashtag Processing for Enhanced Clustering of Tweets

نویسندگان

  • Dagmar Gromann
  • Thierry Declerck
چکیده

Rich data provided by tweets have been analyzed, clustered, and explored in a variety of studies. Typically those studies focus on named entity recognition, entity linking, and entity disambiguation or clustering. Tweets and hashtags are generally analyzed on sentential or word level but not on a compositional level of concatenated words. We propose an approach for a closer analysis of compounds in hashtags, and in the long run also of other types of text sequences in tweets, in order to enhance the clustering of such text documents. Hashtags have been used before as primary topic indicators to cluster tweets, however, their segmentation and its effect on clustering results have not been investigated to the best of our knowledge. Our results with a standard dataset from the Text REtrieval Conference (TREC) show that segmented and harmonized hashtags positively impact effective clustering.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

2016 Olympic Games on Twitter: Sentiment Analysis of Sports Fans Tweets using Big Data Framework

Big data analytics is one of the most important subjects in computer science. Today, due to the increasing expansion of Web technology, a large amount of data is available to researchers. Extracting information from these data is one of the requirements for many organizations and business centers. In recent years, the massive amount of Twitter's social networking data has become a platform for ...

متن کامل

Realtime Ad Hoc Search in Twitter: Know-Center at TREC Microblog Track 2011

In this paper, we outline our experiments carried out at the TREC Microblog Track 2011. Our system is based on a plain text index extracted from Tweets crawled from twitter.com. This index has been used to retrieve candidate Tweets for the given topics. The resulting Tweets were post-processed and then analyzed using three different approaches: (i) a burst detection approach, (ii) a hashtag ana...

متن کامل

Effect of Spam on Hashtag Recommendation for Tweets

Presence of spam tweets in a datasetmay affect the choices of feature selection, algorithm formulation, and system evaluation for many applications. However, most existing studies have not considered the impact of spam tweets. In this paper, we study the impact of spam tweets on hashtag recommendation for hyperlinked tweets (i.e., tweets containing URLs) in HSpam14 dataset. HSpam14 is a collect...

متن کامل

Exploiting Topical Perceptions over Multi-Lingual Text for Hashtag Suggestion on Twitter

Microblogging websites, such as Twitter, provide seemingly endless amount of textual information on a wide variety of topics generated by a large number of users. Microblog posts, or tweets in Twitter, are often written in an informal manner using multi-lingual styles. Ignoring informal styles or multiple languages can hamper the usefulness of microblogging mining applications. In this paper, w...

متن کامل

The perfect solution for detecting sarcasm in tweets #not

To avoid a sarcastic message being understood in its unintended literal meaning, in microtexts such as messages on Twitter.com sarcasm is often explicitly marked with the hashtag ‘#sarcasm’. We collected a training corpus of about 78 thousand Dutch tweets with this hashtag. Assuming that the human labeling is correct (annotation of a sample indicates that about 85% of these tweets are indeed sa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017